Goto

Collaborating Authors

 cover time


On Statistical Estimation of Edge-Reinforced Random Walks

Qinghua, null, Ding, null, Anantharam, Venkat

arXiv.org Machine Learning

Reinforced random walks (RRWs), including vertex-reinforced random walks (VRRWs) and edge-reinforced random walks (ERRWs), model random walks where the transition probabilities evolve based on prior visitation history~\cite{mgr, fmk, tarres, volkov}. These models have found applications in various areas, such as network representation learning~\cite{xzzs}, reinforced PageRank~\cite{gly}, and modeling animal behaviors~\cite{smouse}, among others. However, statistical estimation of the parameters governing RRWs remains underexplored. This work focuses on estimating the initial edge weights of ERRWs using observed trajectory data. Leveraging the connections between an ERRW and a random walk in a random environment (RWRE)~\cite{mr, mr2}, as given by the so-called "magic formula", we propose an estimator based on the generalized method of moments. To analyze the sample complexity of our estimator, we exploit the hyperbolic Gaussian structure embedded in the random environment to bound the fluctuations of the underlying random edge conductances.


Revisiting Random Walks for Learning on Graphs

Kim, Jinwoo, Zaghen, Olga, Suleymanzade, Ayhan, Ryou, Youngmin, Hong, Seunghoon

arXiv.org Artificial Intelligence

We revisit a simple idea for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We refer to these stochastic machines as random walk neural networks, and show that we can design them to be isomorphism invariant while capable of universal approximation of graph functions in probability. A useful finding is that almost any kind of record of random walk guarantees probabilistic invariance as long as the vertices are anonymized. This enables us to record random walks in plain text and adopt a language model to read these text records to solve graph tasks. We further establish a parallelism to message passing neural networks using tools from Markov chain theory, and show that over-smoothing in message passing is alleviated by construction in random walk neural networks, while over-squashing manifests as probabilistic under-reaching. We show that random walk neural networks based on pre-trained language models can solve several hard problems on graphs, such as separating strongly regular graphs where the 3-WL test fails, counting substructures, and transductive classification on arXiv citation network without training.


Online Submodular Set Cover, Ranking, and Repeated Active Learning

Neural Information Processing Systems

We propose an online prediction version of submodular set cover with connections to ranking and repeated active learning. In each round, the learning algorithm chooses a sequence of items. The algorithm then receives a monotone submodular function and suffers loss equal to the cover time of the function: the number of items needed, when items are selected in order of the chosen sequence, to achieve a coverage constraint. We develop an online learning algorithm whose loss converges to approximately that of the best sequence in hindsight. Our proposed algorithm is readily extended to a setting where multiple functions are revealed at each round and to bandit and contextual bandit settings.


Optimal Decision Tree with Noisy Outcomes

Jia, Su, Navidi, Fatemeh, Nagarajan, Viswanath, Ravi, R.

arXiv.org Machine Learning

In pool-based active learning, the learner is given an unlabeled data set and aims to efficiently learn the unknown hypothesis by querying the labels of the data points. This can be formulated as the classical Optimal Decision Tree (ODT) problem: Given a set of tests, a set of hypotheses, and an outcome for each pair of test and hypothesis, our objective is to find a low-cost testing procedure (i.e., decision tree) that identifies the true hypothesis. This optimization problem has been extensively studied under the assumption that each test generates a deterministic outcome. However, in numerous applications, for example, clinical trials, the outcomes may be uncertain, which renders the ideas from the deterministic setting invalid. In this work, we study a fundamental variant of the ODT problem in which some test outcomes are noisy, even in the more general case where the noise is persistent, i.e., repeating a test gives the same noisy output. Our approximation algorithms provide guarantees that are nearly best possible and hold for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades continuously with this number. We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers, and observed that our algorithms have costs very close to the information-theoretic minimum.


A Cover Time Study of a non-Markovian Algorithm

Fang, Guanhua, Samorodnitsky, Gennady, Xu, Zhiqiang

arXiv.org Artificial Intelligence

Given a traversal algorithm, cover time is the expected number of steps needed to visit all nodes in a given graph. A smaller cover time means a higher exploration efficiency of traversal algorithm. Although random walk algorithms have been studied extensively in the existing literature, there has been no cover time result for any non-Markovian method. In this work, we stand on a theoretical perspective and show that the negative feedback strategy (a count-based exploration method) is better than the naive random walk search. In particular, the former strategy can locally improve the search efficiency for an arbitrary graph. It also achieves smaller cover times for special but important graphs, including clique graphs, tree graphs, etc. Moreover, we make connections between our results and reinforcement learning literature to give new insights on why classical UCB and MCTS algorithms are so useful. Various numerical results corroborate our theoretical findings.


Approximation Algorithms for Active Sequential Hypothesis Testing

Gan, Kyra, Jia, Su, Li, Andrew

arXiv.org Machine Learning

In the problem of active sequential hypotheses testing (ASHT), a learner seeks to identify the true hypothesis $h^*$ from among a set of hypotheses $H$. The learner is given a set of actions and knows the outcome distribution of any action under any true hypothesis. While repeatedly playing the entire set of actions suffices to identify $h^*$, a cost is incurred with each action. Thus, given a target error $\delta>0$, the goal is to find the minimal cost policy for sequentially selecting actions that identify $h^*$ with probability at least $1 - \delta$. This paper provides the first approximation algorithms for ASHT, under two types of adaptivity. First, a policy is partially adaptive if it fixes a sequence of actions in advance and adaptively decides when to terminate and what hypothesis to return. Under partial adaptivity, we provide an $O\big(s^{-1}(1+\log_{1/\delta}|H|)\log (s^{-1}|H| \log |H|)\big)$-approximation algorithm, where $s$ is a natural separation parameter between the hypotheses. Second, a policy is fully adaptive if action selection is allowed to depend on previous outcomes. Under full adaptivity, we provide an $O(s^{-1}\log (|H|/\delta)\log |H|)$-approximation algorithm. We numerically investigate the performance of our algorithms using both synthetic and real-world data, showing that our algorithms outperform a previously proposed heuristic policy.


Discovering Options for Exploration by Minimizing Cover Time

Jinnai, Yuu, Park, Jee Won, Abel, David, Konidaris, George

arXiv.org Artificial Intelligence

Finding a set of edges that minimizes expected One of the main challenges in reinforcement learning cover time is an extremely hard combinatorial optimization is solving tasks with sparse reward. We show problem (Braess, 1968; Braess et al., 2005). Thus, our that the difficulty of discovering a distant rewarding algorithm instead seeks to minimize the upper bound of the state in an MDP is bounded by the expected expected cover time given as a function of the algebraic cover time of a random walk over the graph induced connectivity of the graph Laplacian (Fiedler, 1973; Broder by the MDP's transition dynamics. We & Karlin, 1989; Chung, 1996) using the heuristic method therefore propose to accelerate exploration by constructing by Ghosh & Boyd (2006) that improves the upper bound of options that minimize cover time. The the expected cover time of a uniform random walk.


Online Submodular Set Cover, Ranking, and Repeated Active Learning

Guillory, Andrew, Bilmes, Jeff A.

Neural Information Processing Systems

We propose an online prediction version of submodular set cover with connections to ranking and repeated active learning. In each round, the learning algorithm chooses a sequence of items. The algorithm then receives a monotone submodular function and suffers loss equal to the cover time of the function: the number of items needed, when items are selected in order of the chosen sequence, to achieve a coverage constraint. We develop an online learning algorithm whose loss converges to approximately that of the best sequence in hindsight. Our proposed algorithm is readily extended to a setting where multiple functions are revealed at each round and to bandit and contextual bandit settings.